According to Wikipedia, a cryptocurrency is a digital asset designed to work as a medium of exchange, using cryptography to secure transactions and to control the creation of additional units of the currency through a process called mining.
While the best-known example of a cryptocurrency is Bitcoin, there are more than 100 other tradable cryptocurrencies, called altcoins (alternatives to Bitcoin), competing with each other and with Bitcoin.
The motive behind this competition is that Bitcoin has a number of design flaws, and people are inventing new coins to overcome these defects in the hope that their inventions will eventually replace Bitcoin.
As of June 2017, the total market capitalization of all cryptocurrencies was 102 billion USD, 41 billion of which belonged to Bitcoin. Regardless of its design faults, then, Bitcoin is still the dominant cryptocurrency on the markets. As a result, many altcoins cannot be bought with fiat currencies and can only be traded against Bitcoin.
Hence, I chose Bitcoin as my commodity in order to make wiser future investments for my cryptocurrency portfolio.
The ubiquity of Internet access has triggered the emergence of currencies distinct from those used in the prevalent monetary system. The advent of cryptocurrencies based on a unique method called “mining” has brought about significant changes in the online economic activities of users.
Cryptocurrencies are primarily characterized by fluctuations in their price and number of transactions [1][2]. Although Bitcoin was first introduced in 2008 [2][3], it saw no significant fluctuation in its price and number of transactions until the end of 2013 [2], when it began to garner worldwide attention and both its price and transaction count began to rise and fluctuate significantly. Such unstable fluctuations have served as an opportunity for speculation for some users while hindering most others from using cryptocurrencies [1][4][5].
My research will follow a comparative approach. My first framework is a Recurrent Neural Network trained on 3 popular stock market indicators and past prices as key data points to find an optimal technique for cryptocurrency stock market prediction.
My second framework is a sequential model, trained on the sentiment of public company news history and past prices as key data points, consisting of a single Long Short-Term Memory (LSTM) layer that generates a prediction vector for the whole input sequence and one linear Dense layer that aggregates that vector into a single value.
Comparison will be made on the basis of their performance. Both techniques have advantages and disadvantages; my research will analyze the advantages and limitations of each to find which is comparatively better specifically for Bitcoin market prediction.
In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of time steps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of the weights in the transition matrix can have a strong impact on the learning process.
If the weights in this matrix are small (or, more formally, if the leading eigenvalue of the weight matrix is smaller than 1.0), it can lead to a situation called vanishing gradients, where the gradient signal gets so small that learning either becomes very slow or stops working altogether. It also makes learning long-term dependencies in the data more difficult. Conversely, if the weights in this matrix are large (or, again, more formally, if the leading eigenvalue of the weight matrix is larger than 1.0), it can lead to a situation where the gradient signal is so large that it can cause learning to diverge. This is often referred to as exploding gradients.
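This behavior is easy to demonstrate with a small NumPy sketch (illustrative values only, not taken from my model): the backpropagated gradient is multiplied once per time step by the transposed recurrent weight matrix, so its norm shrinks or grows geometrically with the leading eigenvalue.

```python
import numpy as np

def backprop_gradient_norm(leading_eigenvalue, steps=50):
    # Recurrent weight matrix: a scaled 2-D rotation, whose eigenvalue
    # magnitudes equal leading_eigenvalue exactly.
    theta = 0.5
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    W = leading_eigenvalue * rotation
    grad = np.ones(2)  # gradient signal arriving at the last time step
    for _ in range(steps):
        grad = W.T @ grad  # one backward step through time
    return np.linalg.norm(grad)

print(backprop_gradient_norm(0.9))  # vanishes toward 0
print(backprop_gradient_norm(1.1))  # explodes
```

With 50 time steps, an eigenvalue of 0.9 leaves almost no gradient signal, while 1.1 amplifies it by two orders of magnitude.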
These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. The self-recurrent connection has a weight of 1.0 and ensures that, barring any outside interference, the state of a memory cell can remain constant from one time step to another.

The gates serve to modulate the interactions between the memory cell itself and its environment. The input gate can allow incoming signal to alter the state of the memory cell or block it. On the other hand, the output gate can allow the state of the memory cell to have an effect on other neurons or prevent it. Finally, the forget gate can modulate the memory cell’s self-recurrent connection, allowing the cell to remember or forget its previous state, as needed.
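As a rough sketch of these mechanics (with made-up random weights; a real LSTM learns all of these parameters), a single memory-cell step can be written in NumPy as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, params):
    """One LSTM time step: the gates modulate what enters,
    what persists in, and what leaves the cell state c."""
    z = np.concatenate([h_prev, x])
    i = sigmoid(params['W_i'] @ z + params['b_i'])  # input gate
    f = sigmoid(params['W_f'] @ z + params['b_f'])  # forget gate
    o = sigmoid(params['W_o'] @ z + params['b_o'])  # output gate
    g = np.tanh(params['W_g'] @ z + params['b_g'])  # candidate update
    c = f * c_prev + i * g  # self-recurrent connection: with f near 1
                            # the state carries over unchanged
    h = o * np.tanh(c)      # exposed hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {k: rng.normal(size=(n_hid, n_hid + n_in))
          for k in ['W_i', 'W_f', 'W_o', 'W_g']}
params.update({b: np.zeros(n_hid) for b in ['b_i', 'b_f', 'b_o', 'b_g']})
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_cell_step(rng.normal(size=n_in), h, c, params)
```

The key point is the additive update of `c`: because the state is carried forward through multiplication by the forget gate (close to 1.0) rather than by an arbitrary weight matrix, the gradient along the cell state neither vanishes nor explodes the way it does in a plain RNN.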
I believe the biggest difference between NLP and financial analysis is that language comes with some guarantee of structure; it is just that the rules of that structure are vague. Markets, on the other hand, come with no promise of a learnable structure. That such a structure exists is the assumption this project will prove or disprove (or rather, it may prove or disprove whether I can find that structure).
Assuming that a structure exists, the idea of summarizing the current state of the market in the same way we encode the semantics of a paragraph seems plausible to me.
How do Bitcoin markets behave? What are the causes of the sudden spikes and dips in cryptocurrency values? How can we predict what will happen next?
Research on the attributes of cryptocurrencies has made steady progress but has a long way to go. Most researchers analyze user sentiments related to cryptocurrencies on social media, e.g., Twitter, or quantified Web search queries on search engines, such as Google, as well as fluctuations in price and trade volume to determine any relation [6–10]. Past studies have been limited to Bitcoin because the large amount of data that it provides eliminates the need to build a model to predict fluctuations in the price and number of transactions of diverse cryptocurrencies.
Articles on cryptocurrencies, such as Bitcoin, are rife with speculation these days, with hundreds of self-proclaimed experts advocating for the trends they expect to emerge. What is lacking from many of these analyses is a strong data-analysis foundation to back up the claims.
So I felt that analyzing the top headlines on the first page of Google News results for the term Bitcoin, in order to predict its closing price for the next day, was the most unbiased approach to resolving the biased opinions strewn around the web. I also felt that excluding the "Price" suffix was justified: including it led Google News to return those biased articles, as opposed to news simply revolving around Bitcoin, which I feel is more relevant for an unbiased prediction of closing prices.
This approach also resonates with my personal approach to tracking the closing prices of the cryptocurrencies I have invested in: I always find myself skimming the most important headlines on the first page of the Google News results. Extrapolating my research method to a sample of the population representing cryptocurrency investors also seems fair. Don't agree? This article on Bloomberg Markets does!
In a nutshell, the article suggests that, according to Google Trends, global searches for “buy bitcoin” have overtaken “buy gold” after previously exceeding searches for how to purchase silver.
Note - Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. Likewise a score of 0 means the term was less than 1% as popular as the peak.

For the rest of the skeptics, I decided to explore Google Trends myself to evaluate the legitimacy of my claim. The global search interest for bitcoin, over the period starting from January 3, 2009, when the first Bitcoin transaction record, or genesis block, kicked off the Bitcoin blockchain and included a reference to a pertinent newspaper headline of that day:
The Times 03/Jan/2009 Chancellor on brink of second bailout for banks.
looks like this:

My initial feature set includes the Adjusted Open, Adjusted High, Adjusted Low, Adjusted Close, Adjusted Volume (BTC), Adjusted Volume (Currency), and Weighted Price for Bitcoin, retrieved using Quandl's free Bitcoin API for dates ranging from January 7, 2014 to December 12, 2017.
I used pickle to serialize and save the downloaded data as a file, which will prevent my script from re-downloading the same data each time I run the script.
# Define Quandl helper function to download and cache the Bitcoin dataset from Quandl
import json
import pickle
from datetime import datetime

import numpy as np
import pandas as pd
import quandl
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff

py.init_notebook_mode(connected=True)

with open('api_key.json') as f:
    api = json.load(f)
quandl.ApiConfig.api_key = api["api_key"]

def get_quandl_data(quandl_id):
    # Download and cache a Quandl dataseries
    cache_path = '{}.pkl'.format(quandl_id).replace('/', '-')
    try:
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)
        print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError):
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns="pandas", start_date="2014-01-07", end_date="2017-12-12")
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df

# Pull Kraken BTC exchange historical pricing data
btc_usd_price_kraken = get_quandl_data('BCHARTS/KRAKENUSD')
btc_usd_price_kraken.head()

# Chart the BTC close pricing data
btc_trace = go.Scatter(x=btc_usd_price_kraken.index, y=btc_usd_price_kraken['Close'])
py.iplot([btc_trace])
There are a few notable down-spikes, particularly in late 2014 and early 2016. These spikes are specific to the Kraken dataset, and I obviously don't want them to be reflected in my overall pricing analysis.
The nature of Bitcoin exchanges is that pricing is determined by supply and demand, so no single exchange contains a true "master price" of Bitcoin. To solve this issue, along with that of the down-spikes, I pulled data from three more major Bitcoin exchanges to calculate an aggregate Bitcoin price index.
# Pull pricing data for 3 more BTC exchanges
exchanges = ['COINBASE', 'BITSTAMP', 'ITBIT']
exchange_data = {}
exchange_data['KRAKEN'] = btc_usd_price_kraken
for exchange in exchanges:
    exchange_code = 'BCHARTS/{}USD'.format(exchange)
    btc_exchange_df = get_quandl_data(exchange_code)
    exchange_data[exchange] = btc_exchange_df

# Merge all of the pricing data into a single dataframe
def merge_dfs_on_column(dataframes, labels, col):
    # Merge a single column of each dataframe into a new combined dataframe
    series_dict = {}
    for index in range(len(dataframes)):
        series_dict[labels[index]] = dataframes[index][col]
    return pd.DataFrame(series_dict)

# Merge the BTC price dataseries into one dataframe per OHLC column
btc_usd_datasets_close = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Close')
btc_usd_datasets_open = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Open')
btc_usd_datasets_high = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'High')
btc_usd_datasets_low = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Low')
btc_usd_datasets_close.tail()
# Visualize the pricing datasets
# Helper function to provide a single-line command to compare each column in the dataframe
def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    # Generate a scatter plot of the entire dataframe
    label_arr = list(df)
    series_arr = list(map(lambda col: df[col], label_arr))
    layout = go.Layout(
        title=title,
        legend=dict(orientation="h"),
        xaxis=dict(type='date'),
        yaxis=dict(
            title=y_axis_label,
            showticklabels=not seperate_y_axis,
            type=scale
        )
    )
    y_axis_config = dict(
        overlaying='y',
        showticklabels=False,
        type=scale
    )
    visibility = 'visible'
    if initial_hide:
        visibility = 'legendonly'
    # Form a trace for each series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x=series.index,
            y=series,
            name=label_arr[index],
            visible=visibility
        )
        # Add a separate axis for the series
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config
        trace_arr.append(trace)
    fig = go.Figure(data=trace_arr, layout=layout)
    py.iplot(fig)

# Plot all of the BTC exchange closing prices
df_scatter(btc_usd_datasets_close, 'Bitcoin Closing Price (USD) By Exchange')
Although the four series follow roughly the same path, there are various irregularities in each that should be eliminated. Since the price of Bitcoin was never equal to zero in the timeframe I am examining, it makes sense to remove all of the zero values from the combined dataframes.
# Remove "0" values
btc_usd_datasets_close.replace(0, np.nan, inplace=True)
btc_usd_datasets_open.replace(0, np.nan, inplace=True)
btc_usd_datasets_high.replace(0, np.nan, inplace=True)
btc_usd_datasets_low.replace(0, np.nan, inplace=True)
# Plot the cleaned dataframe
df_scatter(btc_usd_datasets_close, 'Bitcoin Closing Price (USD) By Exchange')
# Calculate the average BTC closing price as a new column
btc_usd_datasets_close['avg_btc_close_price_usd'] = btc_usd_datasets_close.mean(axis=1)
btc_usd_datasets_open['avg_btc_open_price_usd'] = btc_usd_datasets_open.mean(axis=1)
btc_usd_datasets_high['avg_btc_high_price_usd'] = btc_usd_datasets_high.mean(axis=1)
btc_usd_datasets_low['avg_btc_low_price_usd'] = btc_usd_datasets_low.mean(axis=1)
# Plot the average BTC closing price
btc_trace = go.Scatter(x=btc_usd_datasets_close.index, y=btc_usd_datasets_close['avg_btc_close_price_usd'])
py.iplot([btc_trace])
btc_usd_datasets_close_final = btc_usd_datasets_close['avg_btc_close_price_usd'].copy()
btc_usd_datasets_open_final = btc_usd_datasets_open['avg_btc_open_price_usd'].copy()
btc_usd_datasets_high_final = btc_usd_datasets_high['avg_btc_high_price_usd'].copy()
btc_usd_datasets_low_final = btc_usd_datasets_low['avg_btc_low_price_usd'].copy()
btc_usd_datasets_close_final = btc_usd_datasets_close_final.reset_index()
btc_usd_datasets_open_final = btc_usd_datasets_open_final.reset_index()
btc_usd_datasets_high_final = btc_usd_datasets_high_final.reset_index()
btc_usd_datasets_low_final = btc_usd_datasets_low_final.reset_index()
btc_usd_datasets_open_final.columns = ['Date','Average Open Price (USD)']
btc_usd_datasets_high_final.columns = ['Date','Average High Price (USD)']
btc_usd_datasets_low_final.columns = ['Date','Average Low Price (USD)']
btc_usd_datasets_close_final.columns = ['Date','Average Close Price (USD)']
btc_usd_datasets_close_final.head()
btc_usd_datasets_final_1 = pd.merge(btc_usd_datasets_open_final, btc_usd_datasets_high_final, on='Date')
btc_usd_datasets_final_2 = pd.merge(btc_usd_datasets_low_final, btc_usd_datasets_close_final, on='Date')
btc_usd_datasets_final = pd.merge(btc_usd_datasets_final_1, btc_usd_datasets_final_2, on='Date')
btc_usd_datasets_final.to_csv('BTC_USD.csv', index=False)
btc_usd_datasets_final.head()
This feature set builds out my dataframe from a CSV file named BTC_USD.csv, which I generated by extending the protocol that defined my initial feature set.
I am going to use 4 features: the close price and three technical indicators.
Exponential Moving Average: a type of infinite impulse response filter that applies weighting factors which decrease exponentially, so the weight of each older datum shrinks toward, but never reaches, zero. It is similar to a simple moving average, except that more weight is given to the latest data; it is also known as the exponentially weighted moving average. This type of moving average reacts faster to recent price changes than a simple moving average. The 12- and 26-day EMAs are the most popular short-term averages, and they are used to create indicators like the moving average convergence divergence (MACD) and the percentage price oscillator (PPO).
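In pandas this weighting comes for free via `ewm`; a quick sketch on a toy price series with a step change (illustrative numbers only, not Bitcoin data) shows the EMA reacting to the jump faster than a simple moving average of the same span:

```python
import pandas as pd

# Toy series: flat at 100, then a step up to 110
prices = pd.Series([100.0] * 10 + [110.0] * 10)

ema = prices.ewm(span=12, adjust=False).mean()  # 12-period EMA
sma = prices.rolling(window=12).mean()          # 12-period simple moving average

# Shortly after the jump, the EMA has moved further toward 110 than the SMA,
# because recent observations carry exponentially more weight.
print(ema.iloc[12], sma.iloc[12])
```

The same `ewm(span=...)` call is what the MACD computation below is built on.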

MACD: The Moving Average Convergence/Divergence oscillator (MACD) is one of the simplest and most effective momentum indicators available. The MACD turns two trend-following indicators, moving averages, into a momentum oscillator by subtracting the longer moving average from the shorter moving average.

Stochastics oscillator: The Stochastic Oscillator is a momentum indicator that shows the location of the close relative to the high-low range over a set number of periods. It measures whether the closing price of a security is closer to the high or the low. It is based on the assumption that when a market is trending upward, the closing price will be closer to the highest price, and, when it is trending downward, the closing price will be closer to the lowest price.

Average True Range: an indicator that measures volatility (NOT price direction). The true range is the largest of the following: the current high less the current low; the absolute value of the current high less the previous close; and the absolute value of the current low less the previous close.
The average true range is a moving average, generally over 14 days, of the true ranges. Basically, a stock experiencing a high level of volatility has a higher ATR, and a low-volatility stock has a lower ATR.
Calculation: TR = max[(High − Low), |High − Previous Close|, |Low − Previous Close|]; ATR is then the 14-day moving average of TR.
def MACD(df, period1, period2, periodSignal):
    # Histogram of the MACD line against its signal line
    EMA1 = df.ewm(span=period1).mean()
    EMA2 = df.ewm(span=period2).mean()
    MACD = EMA1 - EMA2
    Signal = MACD.ewm(span=periodSignal).mean()
    Histogram = MACD - Signal
    return Histogram

def stochastics_oscillator(df, period):
    # %K: position of the close within the rolling high-low range
    l, h = df.rolling(period).min(), df.rolling(period).max()
    k = 100 * (df - l) / (h - l)
    return k

def ATR(df, period):
    # Method A: current High less the current Low
    df['H-L'] = abs(df['Average High Price (USD)'] - df['Average Low Price (USD)'])
    # Method B: current High less the previous Close (absolute value)
    df['H-PC'] = abs(df['Average High Price (USD)'] - df['Average Close Price (USD)'].shift(1))
    # Method C: current Low less the previous Close (absolute value)
    df['L-PC'] = abs(df['Average Low Price (USD)'] - df['Average Close Price (USD)'].shift(1))
    # True range is the largest of the three (note: the rolling
    # 'period'-day average is not applied here; the raw TR is returned)
    TR = df[['H-L', 'H-PC', 'L-PC']].max(axis=1)
    return TR.to_frame()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.options.mode.chained_assignment = None
df = pd.read_csv('BTC_USD.csv',usecols=[1,2,3,4])
dfPrices = pd.read_csv('BTC_USD.csv',usecols=[4])
dfPrices.head(2)
prices = dfPrices.iloc[len(dfPrices.index)-60:len(dfPrices.index)].values.ravel()
plt.figure(figsize=(25,7))
plt.plot(prices,label='Test',color='orange')
plt.title('Price')
plt.legend(loc='upper left')
plt.show()
macd = MACD(dfPrices.iloc[len(dfPrices.index)-60:len(dfPrices.index)],12,26,9)
plt.figure(figsize=(25,7))
plt.plot(macd,label='macd',color='blue')
plt.title('MACD')
plt.legend(loc='upper left')
plt.show()
stochastics = stochastics_oscillator(dfPrices.iloc[len(dfPrices.index)-60:len(dfPrices.index)],14)
plt.figure(figsize=(14,7))
#First 100 points due to extreme density
plt.plot(stochastics[0:100],label='Stochastics Oscillator',color='red')
plt.title('Stochastics Oscillator')
plt.legend(loc='upper left')
plt.show()
atr = ATR(df.iloc[len(df.index)-60:len(df.index)],14)
plt.figure(figsize=(21,7))
#First 100 points due to extreme density
plt.plot(atr[0:100],label='ATR',color='green')
plt.title('Average True Range')
plt.legend(loc='upper left')
plt.show()
dfPriceShift = dfPrices.shift(-1)
dfPriceShift.rename(columns={'Average Close Price (USD)':'Average Close Price Target (USD)'}, inplace=True)
dfPriceShift.head(2)
macd = MACD(dfPrices,12,26,9)
macd.rename(columns={'Average Close Price (USD)':'MACD'}, inplace=True)
stochastics = stochastics_oscillator(dfPrices,14)
stochastics.rename(columns={'Average Close Price (USD)':'Stochastics'}, inplace=True)
atr = ATR(df,14)
atr.rename(columns={0:'ATR'}, inplace=True)
final_data = pd.concat([dfPrices,dfPriceShift,macd,stochastics,atr], axis=1)
# Delete the entries with missing values (where the stochastics couldn't be computed yet)
final_data = final_data.dropna()
final_data.info()
final_data
final_data.to_csv('BTC_USD_TechnicalIndicators.csv',index=False)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
df = pd.read_csv('BTC_USD_TechnicalIndicators.csv')
df.head(2)
dfNorm = (df - df.mean()) / (df.max() - df.min())
dfNorm.head()
num_epochs = 500
total_series_length = len(df.index)
# Sequence Size
truncated_backprop_length = 3
# Number of neurons
state_size = 12
num_classes = 1
num_features = 4
batch_size = 1
num_batches = total_series_length//batch_size//truncated_backprop_length
min_test_size = 100
print('The total length of the series is: {}'.format(total_series_length))
print('The current configuration gives us {} batches of {} observations each, where each one is looking {} steps in the past'.format(num_batches,batch_size,truncated_backprop_length))
dfTrain = dfNorm[df.index < num_batches*batch_size*truncated_backprop_length]
for i in range(min_test_size, len(dfNorm.index)):
    if i % (truncated_backprop_length * batch_size) == 0:
        test_first_idx = len(dfNorm.index) - i
        break
dfTest = dfNorm[df.index >= test_first_idx]
dfTrain.head()
dfTest.head()
xTrain = dfTrain[['Average Close Price (USD)','MACD','Stochastics','ATR']].values
yTrain = dfTrain['Average Close Price Target (USD)'].values
print(xTrain[0:3], '\n', yTrain[0:3])
xTest = dfTest[['Average Close Price (USD)','MACD','Stochastics','ATR']].values
yTest = dfTest['Average Close Price Target (USD)'].values
print(xTest[0:3], '\n', yTest[0:3])
start_avg_cp_train_trace = go.Scatter(y=xTrain[:,0])
layout = dict(title = 'Train Data (' + str(len(xTrain)) + ' data points)')
fig = dict(data=[start_avg_cp_train_trace], layout=layout)
py.iplot(fig)
start_avg_cp_test_trace = go.Scatter(y=xTest[:,0])
layout = dict(title = 'Test Data (' + str(len(xTest)) + ' data points)')
fig = dict(data=[start_avg_cp_test_trace], layout=layout)
py.iplot(fig)
batchX_placeholder = tf.placeholder(dtype=tf.float32,shape=[None,truncated_backprop_length,num_features],name='data_ph')
batchY_placeholder = tf.placeholder(dtype=tf.float32,shape=[None,truncated_backprop_length,num_classes],name='target_ph')
Since I have considered a 3-layer neural network (input layer, recurrent hidden layer, and output layer), and the output is the result of a linear activation of the RNN's last state, we need only a single pair of Weight and Bias.
weight = tf.Variable(tf.truncated_normal([state_size,num_classes]))
bias = tf.Variable(tf.constant(0.1,shape=[num_classes]))
# Unpack
labels_series = tf.unstack(batchY_placeholder, axis=1)
Input to RNN
cell = tf.contrib.rnn.BasicRNNCell(num_units=state_size)
states_series, current_state = tf.nn.dynamic_rnn(cell=cell,inputs=batchX_placeholder,dtype=tf.float32)
states_series = tf.transpose(states_series,[1,0,2])
last_state = tf.gather(params=states_series,indices=states_series.get_shape()[0]-1)
last_label = tf.gather(params=labels_series,indices=len(labels_series)-1)
prediction = tf.matmul(last_state,weight) + bias
prediction
mse_loss = tf.reduce_mean(tf.squared_difference(last_label,prediction))
train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(mse_loss)
train_mse_loss_list = []
test_mse_loss_list = []
test_pred_list = []
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for epoch_idx in range(num_epochs):
        print('Epoch {}'.format(epoch_idx))
        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length * batch_size
            batchX = xTrain[start_idx:end_idx, :].reshape(batch_size, truncated_backprop_length, num_features)
            batchY = yTrain[start_idx:end_idx].reshape(batch_size, truncated_backprop_length, 1)
            feed = {batchX_placeholder: batchX, batchY_placeholder: batchY}
            # TRAIN
            _loss, _train_step, _pred, _last_label, _prediction = sess.run(
                fetches=[mse_loss, train_step, prediction, last_label, prediction],
                feed_dict=feed
            )
            train_mse_loss_list.append(_loss)
            if batch_idx % 200 == 0:
                print('Step {} - MSE Loss: {:.6f}'.format(batch_idx, _loss))
    # TEST
    for test_idx in range(len(xTest) - truncated_backprop_length):
        testBatchX = xTest[test_idx:test_idx+truncated_backprop_length, :].reshape((1, truncated_backprop_length, num_features))
        testBatchY = yTest[test_idx:test_idx+truncated_backprop_length].reshape((1, truncated_backprop_length, 1))
        feed = {batchX_placeholder: testBatchX, batchY_placeholder: testBatchY}
        # The run returns one prediction per window; we want the last one
        m_loss, _last_state, _last_label, test_pred = sess.run([mse_loss, last_state, last_label, prediction], feed_dict=feed)
        # Keep only the last prediction of the window
        test_pred_list.append(test_pred[-1][-1])
        test_mse_loss_list.append(m_loss)
train_rmse = sum(item**(1/2.0) for item in train_mse_loss_list)/len(train_mse_loss_list)
print("Mean Training Loss (RMSE) is {:.6f}".format(train_rmse))
test_rmse = sum(item**(1/2.0) for item in test_mse_loss_list)/len(test_mse_loss_list)
print("Mean Testing Loss (RMSE) is {:.6f}".format(test_rmse))
A coefficient of variation (CV) can be calculated and interpreted in two different settings: analyzing a single variable and interpreting a model.
In the modeling setting, the CV is calculated as the ratio of the root mean squared error (RMSE) to the mean of the dependent variable. In both settings, the CV is often presented as the given ratio multiplied by 100.
The CV for a model aims to describe the model fit in terms of the relative sizes of the squared residuals and outcome values. The lower the CV, the smaller the residuals relative to the predicted value. This is suggestive of a good model fit.
The advantage of the CV is that it is unitless. This allows CVs to be compared to each other in ways that other measures, like standard deviations or root mean squared residuals, cannot be.
In the model CV setting: similarly, the RMSEs of two models both measure the magnitude of the residuals, but they cannot be compared to each other in a meaningful way to determine which model provides better predictions of an outcome. The model RMSE and the mean of the predicted variable are expressed in the same units, so taking the ratio of the two allows the units to cancel. This ratio can then be compared to other such ratios in a meaningful way: between two models, the model with the smaller CV has predicted values that are closer to the actual values.
It is interesting to note the differences between a model's CV and R-squared values. Both are unitless measures that are indicative of model fit, but they define model fit in two different ways: CV evaluates the relative closeness of the predictions to the actual values, while R-squared evaluates how much of the variability in the actual values is explained by the model.
# Calculate the Mean of the predictions
mean_test_pred = sum(test_pred_list)/len(test_pred_list)
# Calculating the Coefficient of Variation of the predictions
cv_rnn = test_rmse/mean_test_pred
print("Coefficient of Variation (in percentage) for the RNN model is {:.6f}".format(cv_rnn*100))
trace = go.Scatter(
    x=np.arange(0, len(train_mse_loss_list)),
    y=train_mse_loss_list,
    mode='markers',
)
layout = go.Layout(
    title="Training Loss",
    xaxis=dict(
        title='epochs',
    ),
    yaxis=dict(
        title='train loss',
    )
)
fig = go.Figure(data=[trace], layout=layout)
py.iplot(fig)
avg_cp_trace = go.Scatter(y=yTest, name = 'Average Close Price (USD)', line = dict(color = ('rgb(205, 12, 24)'), width = 4))
pred_avg_cp_trace = go.Scatter(y=test_pred_list, name = 'Predicted Average Close Price (USD)', line = dict(color = ('rgb(22, 96, 167)'), width = 4))
layout = dict(title = 'Average Close Price (USD) vs Predicted Average Close Price (USD)')
fig = dict(data=[avg_cp_trace, pred_avg_cp_trace], layout=layout)
py.iplot(fig)
You can get a deeper understanding of how I scraped the articles and computed their sentiments in my notebooks titled Google News Scraper and Sentiment Analysis of Top Google News Articles for keyword bitcoin.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.layers import LSTM
df_sentiment = pd.read_csv('bitcoin_news_average_sentiments_2014_2017.csv')
df_sentiment = df_sentiment.drop(columns=["Date"])
df_sentiment.head()
df_cprice_data = pd.read_csv('BTC_USD.csv',usecols=[0,4])
# (the Date index is set on the merged dataframe below)
df_cprice_data.head()
finaldf = pd.concat([df_cprice_data, df_sentiment], axis=1)
finaldf.set_index('Date',inplace=True)
finaldf.head()
avg_cp_trace = go.Scatter(y=dfNorm['Average Close Price (USD)'], x=finaldf.index, name = 'Daily Average Close Price (USD)', line = dict(color = ('rgb(205, 12, 24)'), width = 1))
avg_sentiment_trace = go.Scatter(y=finaldf['Average Sentiment Score'], x=finaldf.index, name = 'Daily Average Sentiment Scores', line = dict(color = ('rgb(22, 96, 167)'), width = 0.5))
layout = dict(title = 'Daily Average Close Price (USD) vs Daily Average Sentiment Score')
fig = dict(data=[avg_cp_trace, avg_sentiment_trace], layout=layout)
py.iplot(fig)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(finaldf.values)
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = pd.DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var{0}(t-{1})'.format(j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var{}(t)'.format(j+1)) for j in range(n_vars)]
        else:
            names += [('var{0}(t+{1})'.format(j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
# Look Back period
n_days = 3
# Number of features
n_features = 2
# Total number of lagged input columns (look-back days x features)
n_obs = n_days*n_features
finaldf_reframed = series_to_supervised(scaled, n_days, 1)
finaldf_reframed
values = finaldf_reframed.values
n_train_days = 1300
train = values[:n_train_days, :]
test = values[n_train_days:, :]
train.shape
# split into input and outputs
train_X, train_y = train[:, :n_obs], train[:, -n_features]
test_X, test_y = test[:, :n_obs], test[:, -n_features]
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_days, n_features))
test_X = test_X.reshape((test_X.shape[0], n_days, n_features))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
model = Sequential()
model.add(LSTM(5, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mae', optimizer='adam')
history = model.fit(train_X, train_y, epochs=500, batch_size=1, validation_data=(test_X, test_y), verbose=2, shuffle=False)
trloss_trace = go.Scatter(y=history.history['loss'], name = 'Training Loss', line = dict(color = ('rgb(205, 12, 24)'), width = 1))
teloss_trace = go.Scatter(y=history.history['val_loss'], name = 'Testing Loss', line = dict(color = ('rgb(22, 96, 167)'), width = 0.5))
layout = dict(title = 'Training Loss vs Testing Loss')
fig = dict(data=[trloss_trace, teloss_trace], layout=layout)
py.iplot(fig)
# make a prediction
ypred = model.predict(test_X)
test_X = test_X.reshape((test_X.shape[0], n_days* n_features))
# invert scaling for forecast
inv_ypred = np.concatenate((ypred, test_X[:, -1:]), axis=1)
inv_ypred = scaler.inverse_transform(inv_ypred)
inv_ypred = inv_ypred[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_y, test_X[:, -1:]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]
# calculate RMSE
rmse = np.sqrt(mean_squared_error(inv_y, inv_ypred))
print('Test RMSE: {:.3f}'.format(rmse))
# Calculate the Mean of the predictions
mean_test_pred = np.mean(inv_ypred)
# Calculating the Coefficient of Variation of the predictions
cv_lstm = rmse/mean_test_pred
print("Coefficient of Variation (in percentage) for the LSTM model is {:.6f}".format(cv_lstm*100))
avg_cp_trace = go.Scatter(y=inv_y, name = 'Average Close Price (USD)', line = dict(color = ('rgb(205, 12, 24)'), width = 4))
pred_avg_cp_trace = go.Scatter(y=inv_ypred, name = 'Predicted Average Close Price (USD)', line = dict(color = ('rgb(22, 96, 167)'), width = 4))
layout = dict(title = 'Average Close Price (USD) vs Predicted Average Close Price (USD)')
fig = dict(data=[avg_cp_trace, pred_avg_cp_trace], layout=layout)
py.iplot(fig)
An evaluation of both approaches suggested that the first framework performs better. This could be a result of its wider range of more accurate, unbiased features (technical stock market indicators), as opposed to the biased feature of news-article sentiment.
Despite LSTM being an improvement over the traditional RNN, in my framework it performs worse, owing to the lack of reliable features on both qualitative and quantitative levels.
In the future I hope to extend my hypothesis to include more relevant features, and to consider better hyperparameter optimization techniques, to ensure better prediction results based on opinion analysis of a wider audience.
[1] Reid F, Harrigan M. An analysis of anonymity in the Bitcoin system. Springer; 2013.
[2] Böhme R, Christin N, Edelman B, Moore T. Bitcoin: Economics, technology, and governance. The Journal of Economic Perspectives. 2015;29(2):213–38.
[3] Nakamoto S. Bitcoin: A peer-to-peer electronic cash system. 2008.
[4] Kondor D, Pósfai M, Csabai I, Vattay G. Do the rich get richer? An empirical analysis of the Bitcoin transaction network. PLoS ONE. 2014;9(2):e86197. doi: 10.1371/journal.pone.0086197
[5] Ron D, Shamir A. Quantitative analysis of the full Bitcoin transaction graph. Financial Cryptography and Data Security. Springer; 2013. p. 6–24.
[6] Garcia D, Tessone CJ, Mavrodiev P, Perony N. The digital traces of bubbles: feedback cycles between socio-economic signals in the Bitcoin economy. Journal of the Royal Society Interface. 2014;11(99):20140623.
[7] Kondor D, Csabai I, Szüle J, Pósfai M, Vattay G. Inferring the interplay between network structure and market effects in Bitcoin. New Journal of Physics. 2014;16(12):125003.
[8] Kristoufek L. BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era. Scientific Reports. 2013;3.
[9] Kristoufek L. What are the main drivers of the Bitcoin price? Evidence from wavelet coherence analysis. PLoS ONE. 2015;10(4):e0123923. doi: 10.1371/journal.pone.0123923
[10] Yelowitz A, Wilson M. Characteristics of Bitcoin users: an analysis of Google search data. Applied Economics Letters. 2015;22(13):1030–6.